Smart Paper Notes

Parallel Context Windows Improve In-Context Learning of Large Language Models

Nir Ratner, Yoav Levine, Yonatan Belinkov, Ori Ram, Omri Abend, Ehud Karpas, Amnon Shashua, Kevin Leyton-Brown, Yoav Shoham

Category: Natural Language Processing

2022-12-21

For applications that require processing large amounts of text at inference time, Large Language Models (LLMs) are handicapped by their limited context windows, which are typically 2048 tokens. In-context learning, an emergent phenomenon in LLMs in sizes above a certain parameter threshold, constitutes one significant example because it can only leverage training examples that fit into the context window. Existing efforts to address the context window limitation involve training specialized architectures, which tend to be smaller than the sizes in which in-context learning manifests due to the memory footprint of processing long texts. We present Parallel Context Windows (PCW), a method that alleviates the context window restriction for any off-the-shelf LLM without further training. The key to the approach is to carve a long context into chunks ("windows") that fit within the architecture, restrict the attention mechanism to apply only within each window, and re-use the positional embeddings among the windows. We test the PCW approach on in-context learning with models that range in size between 750 million and 178 billion parameters, and show substantial improvements for tasks with diverse input and output spaces. Our results motivate further investigation of Parallel Context Windows as a method for applying off-the-shelf LLMs in other settings that require long text sequences.

Translated by Google Translate
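The three ingredients named in the PCW abstract (carving the context into windows, restricting attention to each window, and re-using positions across windows) can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the function name `pcw_layout` is hypothetical, and the treatment of the trailing task tokens (they attend to every window and take positions after the longest window) is an assumption that goes beyond what the abstract states.

```python
from typing import List, Tuple


def pcw_layout(
    window_lengths: List[int], num_task_tokens: int
) -> Tuple[List[int], List[List[bool]]]:
    """Build position ids and an attention mask for a PCW-style layout.

    Tokens are laid out as: window 0, window 1, ..., then the task tokens.
    mask[i][j] is True when token i may attend to token j.
    """
    position_ids: List[int] = []
    window_of: List[int] = []  # window index per token; -1 marks a task token

    for w, length in enumerate(window_lengths):
        # Re-use positions: every window restarts counting at 0.
        position_ids.extend(range(length))
        window_of.extend([w] * length)

    # Assumption: task tokens continue counting after the longest window.
    start = max(window_lengths, default=0)
    position_ids.extend(range(start, start + num_task_tokens))
    window_of.extend([-1] * num_task_tokens)

    n = len(position_ids)
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):  # causal: a token never sees later tokens
            if window_of[i] == -1:
                # Assumption: task tokens see all windows and earlier task tokens.
                mask[i][j] = True
            else:
                # Context tokens see only tokens from their own window.
                mask[i][j] = window_of[i] == window_of[j]
    return position_ids, mask


if __name__ == "__main__":
    positions, mask = pcw_layout([3, 2], num_task_tokens=2)
    print(positions)
    for row in mask:
        print("".join("x" if allowed else "." for allowed in row))
```

With two windows of lengths 3 and 2 and two task tokens, the position ids come out as `[0, 1, 2, 0, 1, 3, 4]`: both windows start at 0, so no position exceeds what a single window would need, and the mask is block-diagonal over the context tokens.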

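Replay buffers are a key component of many reinforcement learning schemes, yet their theoretical properties are not fully understood. In this paper, we analyze a system in which a stochastic process X is pushed into a replay buffer and then randomly sampled to generate a stochastic process Y from the replay buffer. We analyze properties of the sampled process, such as stationarity, Markovity, and autocorrelation, in terms of the properties of the original process. Our theoretical analysis sheds light on why a replay buffer may be a good de-correlator. Our analysis provides theoretical tools for proving the convergence of replay-buffer-based algorithms, which are prevalent in reinforcement learning schemes.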
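The de-correlation effect described in the replay buffer abstract can be illustrated with a small simulation. The sketch below rests on assumptions the abstract does not specify: X is taken to be an AR(1) process, the buffer is a fixed-size FIFO, and Y is drawn uniformly at random from the buffer at every step. The names `simulate` and `lag1_autocorr` are hypothetical.

```python
import random
from collections import deque
from typing import List, Tuple


def lag1_autocorr(xs: List[float]) -> float:
    """Empirical lag-1 autocorrelation of a sequence."""
    mean = sum(xs) / len(xs)
    num = sum((a - mean) * (b - mean) for a, b in zip(xs, xs[1:]))
    den = sum((a - mean) ** 2 for a in xs)
    return num / den


def simulate(
    steps: int = 5000, capacity: int = 1000, phi: float = 0.95, seed: int = 0
) -> Tuple[float, float]:
    """Push an AR(1) process X into a FIFO buffer and sample Y from it.

    Returns the lag-1 autocorrelation of X and of Y.
    """
    rng = random.Random(seed)
    buffer: deque = deque(maxlen=capacity)  # oldest entries are evicted first
    x = 0.0
    xs: List[float] = []
    ys: List[float] = []
    for _ in range(steps + capacity):
        # Assumed source process: x_t = phi * x_{t-1} + Gaussian noise.
        x = phi * x + rng.gauss(0.0, 1.0)
        buffer.append(x)
        if len(buffer) == capacity:  # record only once the buffer is full
            xs.append(x)
            ys.append(buffer[rng.randrange(capacity)])  # uniform sample
    return lag1_autocorr(xs), lag1_autocorr(ys)


if __name__ == "__main__":
    rho_x, rho_y = simulate()
    print(f"lag-1 autocorrelation of X: {rho_x:.3f}")
    print(f"lag-1 autocorrelation of Y: {rho_y:.3f}")
```

Under these assumptions, consecutive values of X are strongly correlated while consecutive samples of Y are nearly uncorrelated, because two uniform draws from a large buffer are usually far apart in time.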


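When reinforcement learning is applied with sparse rewards, agents must spend a long time exploring the unknown environment without any learning signal. Abstraction is one approach that provides the agent with an intrinsic reward for transitioning in a latent space. Prior work focuses on dense continuous latent spaces, or requires the user to manually provide the representation. Our approach is the first to automatically learn a discrete abstraction of the underlying environment. Moreover, our method works on arbitrary input spaces, using an end-to-end trainable regularized successor representation model. For transitions between abstract states, we train a set of temporally extended actions in the form of options, i.e., an action abstraction. Our proposed algorithm, Discrete State-Action Abstraction (DSAA), iteratively alternates between training these options and using them to efficiently explore more of the environment to improve the state abstraction. As a result, our model is useful not only for transfer learning but also in the online learning setting. We show empirically that our agent is able to explore the environment and solve tasks more efficiently than baseline reinforcement learning algorithms. Our code is publicly available at https://github.com/amnonattali/dsaa.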
